Variant Discovery    ◾    155

-downdb \

-webfrom annovar dbnsfp30a humandb/

The database files are downloaded into the specified directory “humandb”. Save the annotation databases of each organism in a separate directory.

Not every non-human organism has a ready-made annotation database. In that case, you can build an annotation database for the organism yourself. The following steps show how to build a gene-based annotation database. As an example, we will build an annotation database for SARS-CoV-2 and use it later to annotate the variants called in a previous example.

The following are the steps to build a SARS-CoV-2 gene-based annotation database:

1. Download the reference genome sequence of the organism in FASTA format and the

sequence annotation file in GFF/GTF format. For SARS-CoV-2, we can download both

files from the NCBI Genome database at

https://www.ncbi.nlm.nih.gov/genome/86693?genome_assembly_id=757732

Use the following commands to create a directory “sarscov2db” and download the reference FASTA file and GFF file into it:

mkdir sarscov2db

cd sarscov2db

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.fna.gz

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/858/895/GCF_009858895.2_ASM985889v3/GCF_009858895.2_ASM985889v3_genomic.gff.gz

Then, decompress the two files with the “gunzip” command:

gunzip GCF_009858895.2_ASM985889v3_genomic.fna.gz

gunzip GCF_009858895.2_ASM985889v3_genomic.gff.gz
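Before moving on, it can help to confirm that the two decompressed files look as expected. The following is a minimal sketch, not part of ANNOVAR or the NCBI tooling: the function names “count_fasta_records” and “gff_feature_counts” are hypothetical helpers introduced here for illustration. The first counts the FASTA header lines, and the second tallies the feature types found in the third column of the GFF file.

```shell
# Hypothetical helper functions for a quick sanity check of the downloads.

# Print the number of sequence records in a FASTA file
# (each record starts with a header line beginning with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Print each feature type (GFF column 3) and how often it occurs,
# as "type<TAB>count", skipping comment lines that start with "#".
gff_feature_counts() {
    awk -F'\t' '!/^#/ && NF >= 3 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1" | sort
}
```

For example, “count_fasta_records GCF_009858895.2_ASM985889v3_genomic.fna” should report a single record if the file holds only the one SARS-CoV-2 reference sequence, and “gff_feature_counts GCF_009858895.2_ASM985889v3_genomic.gff” lists feature types such as gene and CDS with their counts.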

2. Use the “gff3ToGenePred” tool to convert the GFF file to a GenePred file, a file format used to specify the gene track annotations for an imported genome. For the GTF format, use “gtfToGenePred” to convert it into a GenePred file. Both “gff3ToGenePred” and “gtfToGenePred” are among the UCSC Genome Browser application binaries built for standalone command-line use on Linux and UNIX platforms. They can be downloaded by choosing the right platform at “http://hgdownload.soe.ucsc.edu/admin/exe/”. For the sake of simplicity, we can download “gff3ToGenePred” into the same “sarscov2db” directory and use “chmod” to make it executable:

wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/gff3ToGenePred

chmod +x gff3ToGenePred
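The command above assumes a 64-bit Linux machine. As a rough sketch of what “choosing the right platform” could look like in a script, the hypothetical helpers below map the output of “uname” to a platform directory on the UCSC download server. Only “linux.x86_64” comes from the text; the two macOS directory names are assumptions and should be checked against the listing at the download page before use.

```shell
# Hypothetical helper: map `uname -s` and `uname -m` values to a platform
# directory under http://hgdownload.soe.ucsc.edu/admin/exe/.
# Only linux.x86_64 appears in the text; the macOSX names are assumptions.
ucsc_platform_dir() {
    case "$1/$2" in
        Linux/x86_64)  echo "linux.x86_64" ;;
        Darwin/x86_64) echo "macOSX.x86_64" ;;
        Darwin/arm64)  echo "macOSX.arm64" ;;
        *)             echo "unsupported platform: $1 $2" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the download URL of one tool for this machine.
ucsc_tool_url() {
    dir=$(ucsc_platform_dir "$(uname -s)" "$(uname -m)") || return 1
    echo "http://hgdownload.soe.ucsc.edu/admin/exe/${dir}/$1"
}
```

With these defined, “wget "$(ucsc_tool_url gff3ToGenePred)"” followed by “chmod +x gff3ToGenePred” would fetch the tool for the current platform and make it executable.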

If you wish to download all UCSC Genome Browser binaries, run the following: